Emotional content


Emovectors: assessing emotional content in jazz improvisations for creativity evaluation

Jordanous, Anna

arXiv.org Artificial Intelligence

Music improvisation is fascinating to study, being essentially a live demonstration of a creative process. In jazz, musicians often improvise over predefined chord progressions (lead sheets). How do we assess the creativity of jazz improvisations? And can we capture this in automated creativity metrics for current LLM-based generative systems? Demonstrated emotional involvement is closely linked with creativity in improvisation; can we detect such involvement by analysing musical audio? This study hypothesises that if an improvisation contains more evidence of emotion-laden content, it is more likely to be recognised as creative. An embeddings-based method is proposed for capturing the emotional content of musical improvisations, using a psychologically grounded classification of musical characteristics associated with emotions. The resulting 'emovectors' are analysed to test the above hypothesis, comparing across multiple improvisations. Capturing emotional content in this quantifiable way can contribute towards new creativity-evaluation metrics that can be applied at scale.
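
The abstract does not specify the feature set or the emotion mapping, so the sketch below is only a hypothetical illustration of the idea: extract a few coarse audio characteristics with librosa and project them onto emotion labels through hand-picked weights. All weights, labels, and scaling constants here are invented assumptions, not the paper's method.

```python
# Toy "emovector" sketch: coarse audio features -> weighted emotion scores.
# The emotion weights below are illustrative placeholders, loosely inspired
# by psychological findings (e.g. fast tempo and loudness track arousal).
import numpy as np
import librosa

EMOTION_WEIGHTS = {
    #            tempo  loudness  brightness
    "happy":    ( 0.6,   0.3,      0.4),
    "sad":      (-0.5,  -0.4,     -0.3),
    "tense":    ( 0.4,   0.5,      0.2),
    "peaceful": (-0.3,  -0.2,     -0.4),
}

def emovector(path: str) -> dict[str, float]:
    """Return a toy emotion-content vector for one recorded improvisation."""
    y, sr = librosa.load(path, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    loudness = float(librosa.feature.rms(y=y).mean())
    brightness = float(librosa.feature.spectral_centroid(y=y, sr=sr).mean())
    # Normalise each characteristic to a rough [-1, 1] range (ad hoc scaling).
    feats = np.array([
        (float(np.atleast_1d(tempo)[0]) - 120.0) / 60.0,
        (loudness - 0.1) / 0.1,
        (brightness - 2000.0) / 2000.0,
    ])
    return {emo: float(np.dot(w, feats)) for emo, w in EMOTION_WEIGHTS.items()}
```

Comparing the norms or pairwise similarities of such vectors across improvisations would mirror, at toy scale, the paper's comparison of emotion-laden content between performances.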


Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols

Padó, Sebastian, Thomas, Kerstin

arXiv.org Artificial Intelligence

Emotions are a fundamental aspect of artistic expression. Due to their abstract nature, there is a broad spectrum of ways emotions are realized in artworks; these realizations are subject to historical change, and their analysis requires expertise in art history. In this article, we investigate which aspects of emotional expression can be detected by current (2025) vision language models (VLMs). We present a case study of three VLMs (Llava-Llama and two Qwen models) in which we ask the models four sets of questions of increasing complexity about artworks (general content, emotional content, expression of emotions, and emotion symbols) and carry out a qualitative expert evaluation. We find that the VLMs recognize the content of the images surprisingly well, and often also which emotions they depict and how these are expressed. The models perform best on concrete images but fail on highly abstract or highly symbolic ones. Reliable recognition of symbols remains fundamentally difficult. Furthermore, the models continue to exhibit the well-known LLM weakness of giving inconsistent answers to related questions.
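
As a rough illustration of the tiered questioning protocol, the sketch below sends four increasingly complex questions about one artwork to a public LLaVA checkpoint via Hugging Face transformers. The question wording and the model id are assumptions for illustration; the paper's exact Llava-Llama and Qwen setups may differ.

```python
# Hedged sketch of tiered VLM interrogation of an artwork image.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed public checkpoint
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID)

QUESTION_TIERS = {  # increasing complexity, mirroring the study design
    "general content": "What does this artwork depict?",
    "emotional content": "Which emotions does this artwork convey?",
    "expression": "How are these emotions expressed visually?",
    "symbols": "Which symbols in the artwork carry emotional meaning?",
}

def interrogate(image_path: str) -> dict[str, str]:
    """Ask all four question tiers about one image and collect the answers."""
    image = Image.open(image_path)
    answers = {}
    for tier, question in QUESTION_TIERS.items():
        prompt = f"USER: <image>\n{question} ASSISTANT:"
        inputs = processor(images=image, text=prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=200)
        answers[tier] = processor.decode(out[0], skip_special_tokens=True)
    return answers
```

Checking the answers across tiers for mutual consistency is where the abstract's reported LLM weakness (inconsistent answers to related questions) would show up.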


A Linguistic Analysis of Spontaneous Thoughts: Investigating Experiences of Déjà Vu, Unexpected Thoughts, and Involuntary Autobiographical Memories

Venkatesha, Videep, Poulos, Mary Cati, Steadman, Christopher, Mills, Caitlin, Cleary, Anne M., Blanchard, Nathaniel

arXiv.org Artificial Intelligence

The onset of spontaneous thoughts reflects dynamic interactions between cognition, emotion, and attention. Typically, these experiences are studied through subjective appraisals that focus on their triggers, phenomenology, and emotional salience. In this work, we use linguistic signatures to investigate Déjà Vu, Involuntary Autobiographical Memories, and Unexpected Thoughts. Specifically, we analyze the inherent characteristics of the linguistic patterns in participant-generated descriptions of these thought types. We show how, by positioning language as a window into spontaneous cognition, existing theories on these attentional states can be updated and reaffirmed. Our findings align with prior research, reinforcing that Déjà Vu is a metacognitive experience characterized by abstract and spatial language, Involuntary Autobiographical Memories are rich in personal and emotionally significant detail, and Unexpected Thoughts are marked by unpredictability and cognitive disruption. This work demonstrates language's potential to reveal deeper insights into how internal spontaneous cognitive states manifest through expression.
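
The abstract points to concrete linguistic markers (spatial language for Déjà Vu, personal detail for Involuntary Autobiographical Memories). A minimal sketch, assuming spaCy and a hand-picked marker list rather than the authors' actual feature set, might count such markers per description:

```python
# Illustrative marker counting for thought-type descriptions (not the
# authors' pipeline; word lists are assumptions chosen for the sketch).
import spacy

nlp = spacy.load("en_core_web_sm")

SPATIAL_PREPS = {"in", "on", "at", "under", "behind", "near", "inside", "above"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

def thought_markers(description: str) -> dict[str, float]:
    doc = nlp(description)
    n = max(len(doc), 1)
    return {
        # personal detail: a rough marker of involuntary memories
        "first_person_rate": sum(t.lower_ in FIRST_PERSON for t in doc) / n,
        # spatial language: a rough marker of deja vu reports
        "spatial_prep_rate": sum(
            t.lower_ in SPATIAL_PREPS and t.pos_ == "ADP" for t in doc) / n,
        # past-tense verbs as a crude memory-orientation proxy
        "past_tense_rate": sum(t.tag_ in {"VBD", "VBN"} for t in doc) / n,
    }
```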


Human Variability vs. Machine Consistency: A Linguistic Analysis of Texts Generated by Humans and Large Language Models

Zanotto, Sergio E., Aroyehun, Segun

arXiv.org Artificial Intelligence

The rapid advancements in large language models (LLMs) have significantly improved their ability to generate natural language, making LLM-generated texts increasingly indistinguishable from human-written ones. Recent research has predominantly focused on using LLMs to classify text as either human-written or machine-generated. In our study, we adopt a different approach by profiling texts spanning four domains based on 250 distinct linguistic features. We select the M4 dataset from Subtask B of SemEval 2024 Task 8. We automatically calculate various linguistic features with the LFTK tool and additionally measure the average syntactic depth, semantic similarity, and emotional content of each document. We then apply a two-dimensional PCA reduction to all the calculated features. Our analyses reveal significant differences between human-written texts and those generated by LLMs, particularly in the variability of these features, which we find to be considerably higher in human-written texts. This discrepancy is especially evident in text genres with less rigid linguistic style constraints. Our findings indicate that, compared to LLM-generated texts, human-written texts are less cognitively demanding and have higher semantic and richer emotional content. These insights underscore the need to incorporate meaningful linguistic features to better understand the textual outputs of LLMs.
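
The profiling pipeline described here can be approximated as follows: compute LFTK features for each document, standardize them, and project to two dimensions with PCA. The four feature keys below are a tiny illustrative subset of the paper's 250, and the exact LFTK calls are stated to the best of that library's documented usage.

```python
# Sketch of document profiling: LFTK features + 2-D PCA projection.
import numpy as np
import spacy
import lftk
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

nlp = spacy.load("en_core_web_sm")
FEATURES = ["a_word_ps", "a_kup_pw", "n_noun", "n_verb"]  # illustrative subset

def feature_matrix(texts: list[str]) -> np.ndarray:
    """One row of LFTK feature values per document."""
    rows = []
    for text in texts:
        doc = nlp(text)
        extractor = lftk.Extractor(docs=doc)
        feats = extractor.extract(features=FEATURES)
        rows.append([feats[f] for f in FEATURES])
    return np.array(rows)

def project_2d(texts: list[str]) -> np.ndarray:
    """Standardize features, then reduce to two principal components."""
    X = StandardScaler().fit_transform(feature_matrix(texts))
    return PCA(n_components=2).fit_transform(X)
```

Plotting the 2-D coordinates for human-written versus machine-generated documents would make the variability gap the abstract reports directly visible as a difference in spread.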


Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach

Abdalla, Adel N., Osborne, Jared, Andonie, Razvan

arXiv.org Artificial Intelligence

In recent years, the fields of Music Information Retrieval (MIR) and Music Emotion Recognition (MER) have received significant attention, leading to multiple advances in how music is analyzed [1, 2]. These developments have increased the accuracy of determining which emotions are present in a given music sample, but the current state of the art is only now passing 75% through the use of Random Forest and Support Vector Machine models [3]. This contrasts with the field of speech recognition, where current models are approaching 100% accuracy across hundreds of languages for word identification [4] and 85% for standard speech emotion recognition [5]. The additional challenges in music recognition stem from the nature of music itself: the lyrical and emotional content of a vocalist's contribution is only one part of the whole. Tempo, rhythm, timbre, instrumentation choice, perceived genre, and other factors together shape the emotional and tonal landscape of any given work into a unique blend that is interpreted subjectively by individual listeners [6]. The goal of our paper is to show that by changing the underlying structure of a small subset of musical features in any given piece, we can shift the work's perceived emotional content towards a specific desired emotion.
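
As a toy version of the central idea (not the authors' deep learning model), one can treat each musical feature as having a linear effect on a valence-arousal estimate and nudge the features toward a target emotion. All feature directions, targets, and step sizes below are invented placeholders.

```python
# Toy feature nudging toward a target emotion in valence-arousal space.
import numpy as np

# Hypothetical linear effect of each feature on (valence, arousal).
EFFECTS = {
    "tempo_bpm":   np.array([0.002, 0.008]),
    "loudness_db": np.array([0.001, 0.010]),
    "major_mode":  np.array([0.400, 0.000]),  # 1 = major, 0 = minor
}
TARGETS = {"happy": np.array([0.8, 0.6]), "sad": np.array([-0.7, -0.5])}

def nudge(features: dict[str, float], emotion: str, step: float = 0.1) -> dict:
    """One gradient-style step moving the predicted emotion toward the target."""
    pos = sum(EFFECTS[k] * v for k, v in features.items())
    error = TARGETS[emotion] - pos
    return {k: v + step * float(EFFECTS[k] @ error) for k, v in features.items()}
```

Iterating `nudge` until the error is small yields edited feature values (e.g. a higher tempo and a shift to major mode for "happy") that could then be applied to the audio.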


Controlling Emotion in Text-to-Speech with Natural Language Prompts

Bott, Thomas, Lux, Florian, Vu, Ngoc Thang

arXiv.org Artificial Intelligence

In recent years, prompting has quickly become one of the standard ways of steering the outputs of generative machine learning models, thanks to its intuitive use of natural language. In this work, we propose a synthesis system conditioned on embeddings derived from an emotionally rich text that serves as the prompt. A joint representation of speaker and prompt embeddings is integrated at several points within a transformer-based architecture. Our approach is trained on merged emotional speech and text datasets and varies the prompts in each training iteration to increase the generalization capabilities of the model. Objective and subjective evaluation results demonstrate the ability of the conditioned synthesis system to accurately transfer the emotions present in a prompt to speech. At the same time, precise control over speaker identities as well as overall high speech quality and intelligibility are maintained.
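
A minimal PyTorch sketch of the conditioning mechanism described here, assuming invented dimensions and a single injection point rather than the paper's multi-point architecture:

```python
# Sketch: joint speaker + prompt embedding injected into a transformer block.
import torch
import torch.nn as nn

class ConditionedBlock(nn.Module):
    def __init__(self, d_model=256, d_speaker=64, d_prompt=768, n_heads=4):
        super().__init__()
        # Project concatenated speaker and prompt embeddings into model space.
        self.cond = nn.Linear(d_speaker + d_prompt, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, speaker_emb, prompt_emb):
        # Broadcast the joint conditioning vector across all time steps.
        c = self.cond(torch.cat([speaker_emb, prompt_emb], dim=-1)).unsqueeze(1)
        h = x + c
        h = self.norm1(h + self.attn(h, h, h)[0])
        return self.norm2(h + self.ff(h))
```

Repeating this injection at several layers, as the abstract describes, would let the emotion carried by the prompt shape the synthesis throughout the network while the speaker embedding anchors identity.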


Researchers build AI-driven sarcasm detector

The Guardian

Never mind that it can pass the bar exam, ace medical tests and read bedtime stories with emotion: artificial intelligence will never match the marvel of the human mind without first mastering the art of sarcasm. But that art, it seems, may be next on the list of the technology's dizzying capabilities. Researchers in the Netherlands have built an AI-driven sarcasm detector that can spot when the lowest form of wit, and the highest form of intelligence, is being deployed. "We are able to recognise sarcasm in a reliable way, and we're eager to grow that," said Matt Coler at the University of Groningen's speech technology lab. "We want to see how far we can push it."


Evaluating Emotional Nuances in Dialogue Summarization

Zhou, Yongxin, Ringeval, Fabien, Portet, François

arXiv.org Artificial Intelligence

Automatic dialogue summarization is a well-established task that aims to identify the most important content in human conversations and condense it into a short textual summary. Despite recent progress in the field, we show that most research has focused on summarizing factual information, leaving aside the affective content, which can nonetheless convey useful information for analysing, monitoring, or supporting human interactions. In this paper, we propose and evaluate a set of measures, PEmo, to quantify how much emotion is preserved in dialogue summaries. Results show that state-of-the-art summarization models do not preserve emotional content well in their summaries. We also show that by reducing the training set to only emotional dialogues, the emotional content is better preserved in the generated summaries, while the most salient factual information is conserved.
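
The exact definition of PEmo is given in the paper; as an illustrative proxy only, one can compare the emotion distributions of a dialogue and its summary with an off-the-shelf classifier. The model id below is a public checkpoint chosen for the sketch, not necessarily the authors' setup.

```python
# Proxy emotion-preservation score: similarity of emotion distributions
# between a dialogue and its summary (illustrative, not the PEmo measures).
import numpy as np
from transformers import pipeline

clf = pipeline("text-classification",
               model="j-hartmann/emotion-english-distilroberta-base",
               top_k=None)  # return scores for all emotion labels

def emotion_dist(text: str) -> np.ndarray:
    out = clf(text)
    scores = out[0] if isinstance(out[0], list) else out
    return np.array([s["score"] for s in sorted(scores, key=lambda s: s["label"])])

def emotion_preservation(dialogue: str, summary: str) -> float:
    """Cosine similarity between emotion distributions (1.0 = fully preserved)."""
    d, s = emotion_dist(dialogue), emotion_dist(summary)
    return float(d @ s / (np.linalg.norm(d) * np.linalg.norm(s)))
```

Averaging such a score over a test set would give a single number tracking how much affective content a summarization model retains, which is the kind of quantity the paper's measures formalize.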


How Cloud and AI Are Driving Forward Customer Engagement and Service in Contact Centers

#artificialintelligence

The COVID-19 pandemic has accelerated digital transformation in contact centers, making the use of cloud technology and AI the norm for some and the desired future norm for many. As consumer expectations around service continue to change, and contact centers become the digital hub for a company's customer experience, leaders must use cloud and AI to make customer interactions and experiences more personalized and meaningful. For those looking to harness this opportunity, the answer lies in the concentrated use of these technologies to amplify customer service and contact center agent performance. This is what leaders must understand about the ways cloud and AI are changing the contact center, and how they can identify moments to enable more meaningful customer engagement and enhance the agent experience with new technology. According to recent research from Deloitte Digital, at the end of 2020 only 32% of surveyed organizations were running contact center technologies in the cloud; now, 75% expect to make the move within the next two years. This matters because cloud technology is particularly critical to contact centers: it enables more flexible and iterative adjustment of capabilities, scale, and processes, and it drives momentum across everything in the contact center, from the core telephony platform to interaction recording to workforce management.


GANtron: Emotional Speech Synthesis with Generative Adversarial Networks

Hortal, Enrique, Alarcia, Rodrigo Brechard

arXiv.org Artificial Intelligence

Speech synthesis is used in a wide variety of industries. Nonetheless, synthesized speech often sounds flat or robotic, and the state-of-the-art methods that allow prosody control are cumbersome to use and do not allow easy tuning. To tackle some of these drawbacks, in this work we implement a text-to-speech model whose inferred speech can be tuned towards desired emotions. To do so, we use Generative Adversarial Networks (GANs) together with a sequence-to-sequence model using an attention mechanism. We evaluate four different configurations with different inputs and training strategies, study them, and show that our best model can generate speech files that lie in the same distribution as the initial training dataset. Additionally, we propose a new strategy to speed up training convergence by applying a guided attention loss.
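
Guided attention loss (introduced by Tachibana et al. for DC-TTS) penalises attention mass far from the text-to-speech diagonal, encouraging monotonic alignment early in training. Below is a sketch of the standard formulation; this paper's variant may differ in detail.

```python
# Standard guided attention loss for seq2seq TTS attention matrices.
import torch

def guided_attention_loss(attn: torch.Tensor, g: float = 0.2) -> torch.Tensor:
    """attn: (batch, T_dec, T_enc) attention weights from the decoder."""
    _, T_dec, T_enc = attn.shape
    n = torch.arange(T_dec, device=attn.device).float() / max(T_dec - 1, 1)
    t = torch.arange(T_enc, device=attn.device).float() / max(T_enc - 1, 1)
    # Soft penalty mask: ~0 on the diagonal, approaching 1 far from it.
    W = 1.0 - torch.exp(-((n[:, None] - t[None, :]) ** 2) / (2 * g ** 2))
    return (attn * W).mean()
```

Added to the main synthesis loss with a small weight, this term pushes the attention toward the near-diagonal alignments expected between input text and output frames, which is how such a loss speeds up convergence.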